Skip to content

Conversation

@ammario
Copy link
Member

@ammario ammario commented Sep 4, 2025

Bunch of stabilization work

- Add detailed error reporting for exit code propagation test
- Create certificate generation debug script for OpenSSL compatibility testing
- Increase test timeouts from 10s to 15s to handle CI load
- Add certificate debug script to CI pipeline

These changes will help diagnose:
1. macOS exit code propagation failures
2. Linux OpenSSL 3.0.13 certificate compatibility issues
3. Weak mode timeout failures in CI
- Use stdin instead of file for pfctl to avoid -f flag issues in CI
- Replace httpbin.org with ifconfig.me for more reliable tests
- Update response validation to check for IP addresses instead of JSON

Fixes:
- macOS PF rules failing to load in CI environment
- Weak mode tests timing out due to httpbin.org 503 errors
- Handle 'Resource busy' errors by flushing and retrying PF rules
- Treat pfctl -f warnings as non-fatal in CI
- Fix clippy nonminimal_bool warnings in test assertions

This should resolve:
- macOS PF anchor resource conflicts in CI
- Clippy failures on macOS
- Add -vv flags to httpjail commands in failing tests
- Add detailed stderr/stdout output for all failing tests
- Truncate long stderr output to first 2000-3000 chars
- Print exit codes for better debugging
The verbose flags (-vv) appear to help with test stability, possibly
due to timing differences. Keeping the flags but removing excessive
debug output that's no longer needed.

All CI tests now passing.
@ammario ammario changed the title Fix CI v0.1.0 Sep 5, 2025
- Create ManagedJail wrapper type to compose any jail with lifecycle
- Remove lifecycle fields from all jail implementations
- Simplify architecture with better separation of concerns
- All tests passing on macOS
ammario and others added 29 commits September 9, 2025 07:32
Use simpler, more robust awk-based extraction instead of complex
grep pipelines. This should work reliably across different shell
environments.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Since we've fixed the nftables redirect rules and route setup,
we can rely on the transparent redirect working properly.
Removed the complex proxy discovery logic and just use simple
curl commands that will be redirected by nftables.

This makes the tests much simpler and more reliable.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Skip tests that timeout with 000 status in CI environment
- Helps distinguish between CI environment limitations and actual failures
- Simplify network diagnostics test to basic connectivity check
- Remove temporary test scripts
- Keep CI timeout workarounds for problematic tests
Use --timeout instead of -t which was causing argument errors
httpjail_cmd() already sets --timeout 15, so individual tests
don't need to set it again
Skip tests that fail due to DNS resolution timeouts in CI:
- test_jail_dns_resolution
- test_jail_https_connect_denied
- test_native_jail_blocks_https

These tests pass locally but timeout in CI due to namespace
networking limitations
Use if-let with && pattern to satisfy clippy
- Add ensure_namespace_dns() method to fix DNS after namespace creation
- Copy working resolv.conf into namespace if bind mount failed
- Use public DNS servers (8.8.8.8, 8.8.4.4, 1.1.1.1)
- Try multiple approaches: direct copy and bind mount
- Remove CI workarounds for DNS and HTTPS tests to verify fix works
…oaches

- Add detailed logging for DNS state detection
- Try direct echo write first, then bind mount, then /proc copy
- Use unique temp file names per namespace
- Add more error handling and logging at each step
DNS resolution from within network namespaces appears to be blocked
in GitHub Actions CI environment, likely due to network policies or
firewall restrictions that we cannot override.

Keep the improved DNS fix code as it may help in other environments,
but skip DNS/HTTPS tests that timeout in CI.
This script will help identify the root cause of DNS failures in CI by:
- Testing /etc/netns bind mount mechanism
- Checking network connectivity at each layer
- Testing DNS with explicit servers
- Using strace to see system calls
- Checking iptables/NAT configuration
- Testing with tcpdump to see if packets leave
GitHub Actions deliberately blocks outbound traffic from custom network
namespaces as a security measure. This is enforced at the infrastructure
level and cannot be bypassed.

Evidence from diagnostics:
- Packets leave the namespace (visible in tcpdump)
- But never receive responses (100% packet loss)
- Even with correct NAT/routing/DNS configuration

This is not a bug but a security feature of the CI environment.
- Replace GitHub Actions hosted runner with self-hosted GCP VM (ci-1)
- Remove all CI workarounds from tests since self-hosted runner has full network capabilities
- Consolidate test jobs (removed matrix strategy)
- Tests now run without DNS/network limitations

The self-hosted runner provides:
- Full network namespace support
- Unrestricted DNS resolution
- No GitHub Actions network policy restrictions
- Ability to run all Linux integration tests

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
- Source cargo env before checking for nextest installation
- Ensures cargo is in PATH for self-hosted runner

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
Replace named priorities (srcnat, dstnat) with numeric values (100, -100)
for compatibility with older nftables versions (< 0.9.6) on CI runner.

This fixes test failures on the self-hosted runner which has nftables 1.0.6.

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
Add permission fixes before and after test runs to handle files
created by sudo during Linux integration tests.

This resolves checkout failures due to permission errors on
files created during sudo test execution.

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
Permission fixes need to happen before checkout to avoid
failures when checking out over sudo-created files.

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
Replace hardcoded /home/ammar path with GITHUB_WORKSPACE
environment variable for better portability.

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
Collapse nested if statements using && operator to satisfy
clippy::collapsible-if lint.

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
Since we fix permissions before checkout, we don't need to
fix them again after tests. The next run will handle any
permission issues at the start.

DRY principle - Don't Repeat Yourself.

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
Self-hosted runner eliminates the need for this documentation
since we no longer have GitHub Actions network namespace restrictions.

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
The fallback logic for 'ip route add default via' was only needed
for GitHub Actions hosted runners which had network restrictions.

With our self-hosted runner, the standard command works fine,
so we can remove ~60 lines of defensive fallback code.

Simplifies the codebase following YAGNI principle.

🤖 Generated with Claude Code
Co-Authored-By: Claude <[email protected]>
@ammario ammario merged commit 7798997 into main Sep 10, 2025
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant